Shapley Values: Model-Agnostic Local Explanation Models From a Statistical Viewpoint II

Lucile Ter-Minassian
9 min read · Jul 7, 2021

Without an understanding of their mathematical expression, Shapley value attributions can seem counterintuitive. This blog post focuses on explaining that mathematical expression.

Please see Part 1 of the blog post series for an introduction to model explainability and a discussion of tangent line approximations. This post was co-written with Sahra Ghalebikesabi.

3 Shapley Values

Game-Theoretic Introduction

Imagine several players playing a game together: after the order of the players has been decided, the first player carries a chunk of wood as far as they can, and then the subsequent players do the same when their turn comes.

Figure 1. Image by the authors.

Now let’s imagine you want to know which player is the strongest.

Approach 1. Assign an order to the players (e.g. Player 1 goes first, then Player 2, …), let each player carry the chunk of wood as far as they can, and attribute to each player the distance they carried it, i.e.

Attribution of Player 2 = Value Function after Player 2 played − Value Function after Player 1 played

Such an approach can be unfair since the order can influence the performance of the players. For instance, the first part of the path might be the steepest, so that the strength of Player 1 would be underestimated.

Approach 2. To decrease the influence of the order of the players, sample it instead repeatedly from a uniform distribution, i.e.

  • Player 2 goes first, then Player 5, … in a first run,
  • Player 6 goes first, then Player 1, … in a second run,
  • … etc.

and note for each order how far each player has come with the chunk of wood.

Ultimately, attribute each player with the expected distance they carried the chunk of wood where the expectation is taken over the order of players, i.e.

Attribution of Player 2 = Expected Value Function after Player 2 played (over all orderings) − Expected Value Function before Player 2 played (over all orderings)
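
As a toy illustration of Approach 2, here is a minimal Python sketch for a hypothetical version of the game; the player strengths, the additive value function, and the number of sampled orderings are all made up for the example:

```python
import random
from collections import defaultdict

# Hypothetical player strengths: how far each player can carry the wood.
strengths = {1: 3.0, 2: 5.0, 3: 2.0, 4: 4.0}

def distance_after(ordering, n_played):
    """Value function: total distance covered once the first n_played
    players in `ordering` have had their turn (here simply additive)."""
    return sum(strengths[p] for p in ordering[:n_played])

def attributions(n_orderings=10_000, seed=0):
    rng = random.Random(seed)
    players = list(strengths)
    totals = defaultdict(float)
    for _ in range(n_orderings):
        ordering = players[:]
        rng.shuffle(ordering)  # sample an ordering uniformly at random
        for position, player in enumerate(ordering, start=1):
            # marginal contribution of this player under this ordering:
            # value after they played minus value before they played
            totals[player] += (distance_after(ordering, position)
                               - distance_after(ordering, position - 1))
    return {p: totals[p] / n_orderings for p in players}

print(attributions())  # with an additive value function, each player's
                       # attribution is simply their own strength
```

The interesting cases arise when the value function is not additive, for example when the first stretch of the path is the steepest; averaging over orderings is what then makes the attribution fair.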

Shapley Values for Model Explainability

In the context of model explainability, Shapley values are used to compute feature attributions of a complex machine learning model f at a given instance x_i with only black-box access to the model.

Here, the value function is a function of the outcome computed by the black-box model, and the players are the features of the data. However, features have no natural ordering. Therefore, instead of choosing an order, we choose whether each feature is either “included” in our model or “removed”.

The model f could, for instance, be retrained with all possible combinations of features removed/included, resulting in 2^m models f_1, …, f_{2^m}. In this scenario:

Attribution of Feature 2 = Expected f_j(x_i) over all models f_j where Feature 2 is included in training − Expected f_j(x_i) over all models f_j where Feature 2 is removed in training

However, retraining 2^m models would make the computation of feature attributions cumbersome. To overcome this computational burden, we define a value function v that approximates feature removal in f.

In other words, for all subsets S of features without j,

v(x_{S ∪ {j}}) − v(x_S) ≈ f_{S ∪ {j}}(x_i) − f_S(x_i),

so that no model ever needs to be retrained.

Note: the renowned KernelSHAP approach (Lundberg and Lee, 2017) describes an optimization procedure for the computation of Shapley values.
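
For completeness, here is the classical Shapley formula written out: the attribution of feature j at instance x_i is a weighted sum over all subsets S of features that do not contain j,

```latex
\phi_j(x_i) \;=\; \sum_{S \subseteq \{1,\dots,m\} \setminus \{j\}}
\frac{|S|!\,(m-|S|-1)!}{m!}\,
\Big[\, v\big(x_{S \cup \{j\}}\big) \;-\; v\big(x_S\big) \Big].
```

The combinatorial weight of each subset is the probability that exactly the players in S precede j in a uniformly random ordering; these weights are the coefficients α that reappear in the hiring example below.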

Choice of Value Function

How are we going to evaluate our black box f (which has been trained on all m features) on a subset of features S?

To do so, our value function imputes each “removed” feature with a sample from a reference distribution r(X*|x). The choice of reference distribution is crucial to the interpretation of the resulting Shapley values. So far, r has typically been chosen as a global distribution, such as the marginal or conditional distribution of the removed features discussed in the next section.

In other words, the value function v takes as an input a subset of feature values x_S from x_i.

Say m=5, then a possible input would be x_S= (x_2, x_4, x_5). It then imputes the absent features (1 and 3) with values X*_{not S}=(imp1, imp3), and evaluates f on the newly-formed input vector (imp1, x_2, imp3, x_4, x_5).

Finally, as the imputation isn’t deterministic, for a given subset x_S the process is repeated a fixed (user-defined) number of times L, resulting in a list of imputed vectors (x_S, X*_{not S, 1}), …, (x_S, X*_{not S, L}).

Hint: the intuition behind all of this is the following: to estimate the impact of a given feature on the prediction for our instance x_i, we “perturb” different combinations of features (equivalent to removing and imputing them). The attribution of feature j is then calculated by taking the expected difference between the prediction with feature j set to its actual value x_ij and the prediction when it is perturbed. The expectation is taken over all “perturbation scenarios” of the other features.
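
To make the imputation step concrete, here is a minimal sketch of a value function with marginal imputation. The function and variable names (and the choice of NumPy) are ours for illustration only, not a specific library API:

```python
import numpy as np

def value_function(f, x, kept_idx, background, n_imputations=100, seed=0):
    """Evaluate the black box f at x with only the features in kept_idx "included".

    f          : callable mapping an (n, m) array to n predictions
    x          : (m,) NumPy array, the instance to explain
    background : (n, m) reference dataset used for marginal imputation

    The remaining ("removed") features are imputed with values taken from rows
    of the background dataset, and the model outputs are averaged over the
    n_imputations imputed vectors.
    """
    rng = np.random.default_rng(seed)
    # Draw L reference rows, then overwrite the kept features with x's values,
    # so each row becomes a concatenated vector (x_S, X*_{not S, l}).
    rows = rng.integers(len(background), size=n_imputations)
    samples = background[rows].copy()
    samples[:, kept_idx] = x[kept_idx]
    return f(samples).mean()
```

With m = 5 and x_S = (x_2, x_4, x_5) as in the example above, kept_idx would be [1, 3, 4] (0-indexed), and features 1 and 3 (indices 0 and 2) would be filled in from the background rows.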

4 Conditional Imputation vs Marginal Imputation

In this section, we introduce two possible choices of reference distributions: conditional and marginal. To do so, we show with an example how to compute Shapley values according to the two options. Ultimately, we discuss the strengths and limitations of each alternative.

Example: A Hiring Algorithm

Consider a hiring algorithm for a moving company (Merrick and Taly, 2020) that makes the hiring decision for a candidate from two input features, male and strong. However, it is a biased algorithm that only hires men regardless of the applicant’s strength:

f(x_male, x_strong) = x_male,

with all features and the outcome being binary.

We want to understand why the algorithm would hire a strong male x=(1, 1), and in particular, we want to compute the Shapley value (i.e. the feature attribution) of feature strong.

There exist two subsets without the feature strong: the empty set {} and {male}.

The Shapley value is thus a weighted sum of the two differences:

  • v(strong, male)-v(male)
  • v(strong)-v(.)

where v(.) is often taken as the average outcome in the entire dataset.

The coefficients α_1 and α_2 (here both are 0.5) are calculated combinatorially and reflect how likely we are to sample each subset. Further details can be found in the paper on KernelSHAP (Lundberg and Lee, 2017).

Φ(strong, x) = expectation{ α_1 [v(strong, male) − v(male)] + α_2 [v(strong) − v(.)] }

How do we impute in practice?

Let's consider v(strong), for instance.

If we were to choose conditional imputation:
v(strong) = E_{x_male | x_strong = 1}[ f(x_strong = 1, x_male) ]

If we were to choose marginal imputation:
v(strong) = E_{x_male}[ f(x_strong = 1, x_male) ]

= E_x[ f(x_strong, x_male) | do(x_strong = 1) ]*

*See the paper by Janzing et al. (2020) for the equivalence with do-calculus.
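
Here is a minimal sketch of the two estimates of v(strong) for the hiring example, using a made-up synthetic dataset in which male and strong are correlated; the probabilities in the data-generating process are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical applicant pool: gender is balanced, and strength is
# correlated with gender (the exact probabilities are made up).
male = rng.binomial(1, 0.5, size=n)
strong = rng.binomial(1, np.where(male == 1, 0.8, 0.3))

def f(male, strong):
    """The biased black box: hires men regardless of strength."""
    return male

# Marginal imputation: impute x_male from its marginal distribution,
# ignoring that we know x_strong = 1.
v_strong_marginal = f(male, np.ones(n)).mean()

# Conditional imputation: impute x_male from its distribution
# conditional on x_strong = 1.
mask = strong == 1
v_strong_conditional = f(male[mask], np.ones(mask.sum())).mean()

print(v_strong_marginal)     # ≈ P(male) = 0.5
print(v_strong_conditional)  # ≈ P(male | strong = 1) ≈ 0.73 > 0.5
```

Under marginal imputation the estimate equals the overall proportion of male applicants, whereas under conditional imputation the correlation between strong and male leaks into v(strong). This is exactly why conditional Shapley values share attribution among correlated features, as discussed below.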

Conditional, marginal: what difference does it make?

(a) By choosing marginal imputation:

  • Limitation 1a: being off the data manifold: we evaluate our model on instances that aren’t plausible (e.g. a smoking baby), because they have been created by sampling certain feature values regardless of the correlations between features.
  • Limitation 2a: sensitivity to adversarial attacks: an adversary is able to distinguish “real” inputs from “imputed” ones. We can imagine a scenario where an algorithm behaves unfairly on real inputs (e.g. only hiring males regardless of strength), but uses positive discrimination on imputed inputs (i.e. when inputs are detected as “imputed” by the adversary, hiring only females). When explained, this model would seem fair (gender would get a zero attribution). However, on real-world data, this model would be severely unfair.
  • Strength a: true to the model: importance is given only to features the model explicitly uses, and the largest attributions go to the features whose perturbation changes the model prediction the most.

(b) By choosing conditional imputation:

  • Limitation 1b: conditional Shapley values are not true to the model (Chen et al., 2020). Since conditional Shapley values capture data dependencies, attributions are shared among correlated features. In the earlier example, if we remove the feature strong from the set {strong, male}, the imputed value of strong stays high for male applicants, so the impact of removing strong is not as large as it would have been under marginal removal.
  • Strength b: the black box is evaluated on the data manifold: contrary to marginal imputation, we do not break correlations. Using a conditional reference distribution allows us to capture the data dependence.

Ultimately, both marginal and conditional imputation suffer from the following limitation:

Limitation 3: All imputed instances stand on an equal footing. When explaining the prediction of a given instance, the prediction of similar instances (slight perturbations) should be more important than the prediction of instances that are very different from our initial instance x_i.

This problem is solved by Neighbourhood SHAP.

5 Neighbourhood SHAP

The main idea behind Neighbourhood SHAP is to weigh the imputed instances (x_S, X*_{not S}) according to their distance to the instance of interest x_i.

As a result:

  • we keep strength a, and stay true to the model
  • we overcome limitation 3
  • by weighting according to the distance, we stay within a neighbourhood of x_i and can remain on the data manifold (see Figure 2). This avoids limitation 1a and recovers strength b of conditional imputation
  • being more “true to the data” also mitigates adversarial attacks, which cancels out limitation 2a

Figure 2. Concatenated data (pink dots) used for model evaluations for the computation of marginal Shapley values (left) and Neighbourhood SHAP (nbrd = 0.1, right) at a randomly sampled instance (maroon dot), where the data manifold is a ring in R². Even though the background reference (blue dots) lies on the data manifold, marginal Shapley values are evaluated at instances that lie off the data manifold. Image from Ghalebikesabi et al. (2021).

To further attest to the added value of Neighbourhood SHAP compared to existing methods, consider the following black-box model f:

f(x) = I(x_1 > 0) · 2x_2² − I(x_1 ≤ 0) · x_2²,

where I(·) denotes the indicator function.

When attributing the local feature importance at a test instance x = (x_1, 2), with x_2 fixed at 2, Feature 1 should naturally receive a higher absolute attribution when x_1 is closer to the decision boundary at x_1 = 0.
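
To see why, here is a quick sanity check of this black box at x_2 = 2 (a throwaway snippet added for illustration, not from the paper):

```python
import numpy as np

def f(x):
    """Black box from the example: I(x_1 > 0)*2*x_2**2 - I(x_1 <= 0)*x_2**2."""
    x = np.atleast_2d(x)
    x1, x2 = x[:, 0], x[:, 1]
    return np.where(x1 > 0, 2 * x2**2, -(x2**2))

# With x_2 fixed at 2, the prediction only depends on the sign of x_1:
print(f([[0.01, 2.0]]), f([[3.0, 2.0]]))    # [8.] [8.]
print(f([[-0.01, 2.0]]), f([[-3.0, 2.0]]))  # [-4.] [-4.]
# Far from the boundary at x_1 = 0, small perturbations of x_1 rarely flip
# the sign, so locally Feature 1 hardly matters; close to the boundary it
# matters a lot.
```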

The results on this example from LIME and SHAP, as well as from our proposed ‘Neighbourhood SHAP’ approach, are shown in Figure 3. Neighbourhood SHAP assigns Feature 1 a smaller attribution the larger the absolute value of x_1 is. The attribution of Feature 1 from SHAP and LIME, however, is constant on either side of x_1 = 0, which illustrates that these methods capture global model behaviour. Ultimately, the result from the local linear approximation to the black box (Botari et al., 2020) is misleading since Feature 2 receives a significantly positive attribution for x_1 ∈ [−0.4, 0], although it contributes clearly negatively to the model outcome for any negative x_1.

Figure 3. Attributions at x = (x_1, 2) with x_1 varying, for a reference distribution X ∼ Normal(0, 1) and black box f(x) = I(x_1 > 0) · 2x_2² − I(x_1 ≤ 0) · x_2², averaged over 10 runs and displayed with 95% confidence intervals. While (Tabular) LIME (Ribeiro et al., 2016) and SHAP assign the same absolute attribution to Feature 1 no matter how large x_1 is, our neighbourhood approach takes its distance to the decision boundary into consideration. A local linear approximation to the black box trained with a Ridge regressor gives misleading attributions to Feature 1 for −0.4 < x_1 < 0. Image from Ghalebikesabi et al. (2021).

Pseudo-code for Neighbourhood SHAP:

Neighbourhood SHAP can easily be implemented within existing coding frameworks whenever the data weights can be specified, for instance with the SHAP library.
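
For concreteness, below is a minimal from-scratch sketch of the idea: exact enumeration over feature subsets, marginal imputation from a background dataset, and a Gaussian kernel on the distance between each imputed instance and x_i. The function names, the kernel choice, and the bandwidth are ours for illustration; this is neither the SHAP library API nor the authors' reference implementation.

```python
import itertools
import math
import numpy as np

def neighbourhood_shap(f, x, background, bandwidth=0.1):
    """Shapley values of f at x where every imputed instance is down-weighted
    by a Gaussian kernel of its distance to x (the Neighbourhood SHAP idea).

    f          : callable mapping an (n, m) array to n predictions
    x          : (m,) NumPy array, the instance to explain
    background : (n, m) reference dataset used for marginal imputation
    """
    m = x.shape[0]

    def value(kept):
        # Build the imputed instances (x_S, X*_{not S}) from the background.
        samples = background.copy()
        if kept:
            samples[:, list(kept)] = x[list(kept)]
        # Weight each imputed instance by its proximity to x.
        dists = np.linalg.norm(samples - x, axis=1)
        weights = np.exp(-0.5 * (dists / bandwidth) ** 2)
        if weights.sum() == 0:
            weights = np.ones_like(weights)  # fall back to uniform weights
        return np.average(f(samples), weights=weights)

    phi = np.zeros(m)
    for j in range(m):
        other_features = [k for k in range(m) if k != j]
        for size in range(m):
            # Classical Shapley weight |S|! (m - |S| - 1)! / m!
            coeff = (math.factorial(size) * math.factorial(m - size - 1)
                     / math.factorial(m))
            for S in itertools.combinations(other_features, size):
                phi[j] += coeff * (value(set(S) | {j}) - value(set(S)))
    return phi
```

Taking the bandwidth very large makes all weights (nearly) equal and recovers ordinary marginal Shapley values; shrinking it concentrates the value function on the neighbourhood of x_i.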

References

S. Ghalebikesabi, L. Ter-Minassian, K. Diaz-Ordaz, and C. Holmes (2021). On Locality of Local Explanation Models. arXiv preprint arXiv:2106.14648.

Lundberg, S.M. and Lee, S.I., 2017, December. A unified approach to interpreting model predictions. In Proceedings of the 31st international conference on neural information processing systems (pp. 4768–4777).

Janzing, D., Minorics, L. and Blöbaum, P., 2020, June. Feature relevance quantification in explainable AI: A causal problem. In International Conference on Artificial Intelligence and Statistics (pp. 2907–2916). PMLR.

Chen, H., Janizek, J.D., Lundberg, S. and Lee, S.I., 2020. True to the Model or True to the Data?. arXiv preprint arXiv:2006.16234.

Merrick, L. and Taly, A., 2020, August. The explanation game: Explaining machine learning models using shapley values. In International Cross-Domain Conference for Machine Learning and Knowledge Extraction (pp. 17–38). Springer, Cham.

Botari, T., Hvilshøj, F., Izbicki, R., and de Carvalho, A. C. (2020). Melime: Meaningful local explanation for machine learning models. arXiv preprint arXiv:2009.05818.

Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. “Why should I trust you?: Explaining the predictions of any classifier.” Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM (2016)
